
Off-Policy General Value Functions to Represent Dynamic Role Assignments in RoboCup 3D Soccer Simulation


Abstract

Collecting and maintaining accurate world knowledge in a dynamic, complex, adversarial, and stochastic environment such as the RoboCup 3D Soccer Simulation is a challenging task, and knowledge must be learned in real time under tight time constraints. We use recently introduced off-policy gradient-descent algorithms within reinforcement learning to represent learnable knowledge for dynamic role assignments. The results show that the agents learned policies competitive against the top teams from the RoboCup 2012 competitions in three-vs-three, five-vs-five, and seven-vs-seven games. We explicitly used subsets of agents to identify the dynamics and the semantics for which the agents learn to maximize their performance measures, and to gather knowledge about different objectives, so that all agents participate effectively and efficiently within the group.
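The off-policy gradient-descent learners the abstract refers to belong to the gradient-TD family (e.g., GTD/GQ). As an illustrative sketch only, not the paper's implementation, the following shows one such method, TDC with per-decision importance sampling and linear (one-hot) features, evaluating a target policy from data generated by a different behavior policy. The toy 5-state chain task, step sizes, and policies are all hypothetical assumptions.

```python
import random

# Illustrative sketch (not the paper's code): off-policy policy evaluation
# with TDC, a gradient-TD algorithm, using per-decision importance sampling.
# Toy task: a 5-state chain; the target policy always moves right, while the
# behavior policy moves left/right uniformly at random.
N = 5          # number of non-terminal states
GAMMA = 0.9    # discount factor
ALPHA = 0.1    # primary step size
BETA = 0.05    # secondary (correction) step size

def phi(s):
    """One-hot feature vector for state s (tabular special case)."""
    f = [0.0] * N
    f[s] = 1.0
    return f

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

w = [0.0] * N  # value-function weights
h = [0.0] * N  # auxiliary weights for the gradient correction

random.seed(0)
for _ in range(20000):
    s = random.randrange(N)
    a = random.choice([-1, +1])          # behavior policy: uniform
    rho = 2.0 if a == +1 else 0.0        # rho = pi(a|s)/b(a|s); pi(right)=1, b(right)=0.5
    s2 = max(s + a, 0)                   # bounce off the left wall
    if s + a >= N:                       # right end: terminal with reward 1
        r, x2, done = 1.0, [0.0] * N, True
    else:
        r, x2, done = 0.0, phi(s2), False
    x = phi(s)
    delta = r + (0.0 if done else GAMMA * dot(w, x2)) - dot(w, x)
    hx = dot(h, x)
    for i in range(N):                   # TDC updates, each weighted by rho
        w[i] += ALPHA * rho * (delta * x[i] - GAMMA * x2[i] * hx)
        h[i] += BETA * rho * (delta - hx) * x[i]

# Under the target "always right" policy, the true value of state s
# is GAMMA ** (N - 1 - s), which w approaches despite off-policy data.
```

Because rho is zero for leftward actions, samples inconsistent with the target policy contribute nothing, which is exactly what lets the agents of the paper learn predictions about roles they are not currently executing.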

